AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)

Neural Information Processing Systems | Feb-17-2026, 04:22:36 GMT

a45296e83b19f656392e0130d9e53cb1-Paper-Conference.pdf

large language model, machine learning, natural language, (21 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Montserrat (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

arXiv.org Artificial Intelligence | Oct-31-2025

GenIR: Generative Visual Feedback for Mental Image Retrieval

Yang, Diji, Liu, Minghao, Lo, Chung-Hsiang, Zhang, Yi, Davis, James

Vision-language models (VLMs) have shown strong performance on text-to-image retrieval benchmarks. However, bridging this success to real-world applications remains a challenge. In practice, human search behavior is rarely a one-shot action. Instead, it is often a multi-round process guided by a mental image, which may range from a vague recollection to a vivid mental representation of the target image. Motivated by this gap, we study the task of Mental Image Retrieval (MIR), which targets the realistic yet underexplored setting where users refine their search for a mentally envisioned image through multi-round interactions with an image search engine. Central to successful interactive retrieval is the capability of machines to provide users with clear, actionable feedback; however, existing methods rely on indirect or abstract verbal feedback, which can be ambiguous, misleading, or ineffective for helping users refine the query. To overcome this, we propose GenIR, a generative multi-round retrieval paradigm leveraging diffusion-based image generation to explicitly reify the AI system's understanding at each round. These synthetic visual representations provide clear, interpretable feedback, enabling users to refine their queries intuitively and effectively. We further introduce a fully automated pipeline to generate a high-quality multi-round MIR dataset. Experimental results demonstrate that GenIR significantly outperforms existing interactive methods in the MIR scenario. This work establishes a new task with a dataset and an effective generative retrieval method, providing a foundation for future research in this direction.
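The multi-round loop described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the diffusion model is replaced by a stand-in that simply echoes the query embedding, the user is simulated by a fixed "mental image" vector, and every name here (`generate_feedback`, `refine`, `mental_image_retrieval`, the three-dimensional vectors) is hypothetical. Using the generated feedback as the retrieval query is also a choice made here for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, gallery):
    """Return the name of the gallery item most similar to the query."""
    return max(gallery, key=lambda name: cosine(query_vec, gallery[name]))


def generate_feedback(query_vec):
    """Stand-in for an image generator (assumption): echo the query embedding."""
    return list(query_vec)


def refine(query_vec, feedback_vec, mental_vec, step=0.5):
    """Simulated user: shift the query toward what the feedback is missing."""
    return [q + step * (m - f) for q, f, m in zip(query_vec, feedback_vec, mental_vec)]


def mental_image_retrieval(initial_query, mental_vec, gallery, rounds=3):
    """Run several rounds of generate -> retrieve -> refine; return the top hit per round."""
    query = list(initial_query)
    history = []
    for _ in range(rounds):
        feedback = generate_feedback(query)          # system shows its current understanding
        history.append(retrieve(feedback, gallery))  # search with that understanding
        query = refine(query, feedback, mental_vec)  # user corrects the query
    return history


gallery = {"beach": [1.0, 0.0, 0.0], "forest": [0.0, 1.0, 0.0], "city": [0.0, 0.0, 1.0]}
print(mental_image_retrieval([0.6, 0.4, 0.0], [0.1, 0.9, 0.2], gallery))
```

In this toy run the first round retrieves the wrong item and later rounds move toward the simulated mental image, which is the qualitative behavior the abstract attributes to visual feedback.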

information retrieval, machine learning, natural language, (19 more...)

2506.0622

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.46)
Health & Medicine (0.46)
Energy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Neural Information Processing Systems | Oct-10-2025, 12:08:36 GMT

Mind Eye of LLMs: Visualization of Thought Elicits

However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored.

instruction, reasoning, visualization, (17 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Montserrat (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

arXiv.org Artificial Intelligence | Jul-22-2025

Can Mental Imagery Improve the Thinking Capabilities of AI Systems?

Larabi, Slimane

Although existing models can interact with humans and provide satisfactory responses, they lack the ability to act autonomously or engage in independent reasoning. Furthermore, input data in these models is typically provided as explicit queries, even when some sensory data is already acquired. In addition, AI agents, which are computational entities designed to perform tasks and make decisions autonomously based on their programming, data inputs, and learned knowledge, have shown significant progress. However, they struggle with integrating knowledge across multiple domains, unlike humans. Mental imagery plays a fundamental role in the brain's thinking process, which involves performing tasks based on internal multisensory data, planned actions, needs, and reasoning capabilities. In this paper, we investigate how to integrate mental imagery into a machine thinking framework and how this could be beneficial in initiating the thinking process. Our proposed machine thinking framework integrates a Cognitive thinking unit supported by three auxiliary units: the Input Data Unit, the Needs Unit, and the Mental Imagery Unit. Within this framework, data is represented as natural language sentences or drawn sketches, serving both informative and decision-making purposes. We conducted validation tests for this framework, and the results are presented and discussed.

large language model, machine learning, natural language, (19 more...)

2507.12555

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

arXiv.org Artificial Intelligence | Jun-10-2025

NSD-Imagery: A benchmark dataset for extending fMRI vision decoding methods to mental imagery

Kneeland, Reese, Scotti, Paul S., St-Yves, Ghislain, Breedlove, Jesse, Kay, Kendrick, Naselaris, Thomas

We release NSD-Imagery, a benchmark dataset of human fMRI activity paired with mental images, to complement the existing Natural Scenes Dataset (NSD), a large-scale dataset of fMRI activity paired with seen images that enabled unprecedented improvements in fMRI-to-image reconstruction efforts. Recent models trained on NSD have been evaluated only on seen image reconstruction. Using NSD-Imagery, it is possible to assess how well these models perform on mental image reconstruction. This is a challenging generalization requirement because mental images are encoded in human brain activity with relatively lower signal-to-noise and spatial resolution; however, generalization from seen to mental imagery is critical for real-world applications in medical domains and brain-computer interfaces, where the desired information is always internally generated. We provide benchmarks for a suite of recent NSD-trained open-source visual decoding models (MindEye1, MindEye2, Brain Diffuser, iCNN, Takagi et al.) on NSD-Imagery, and show that the performance of decoding methods on mental images is largely decoupled from performance on vision reconstruction. We further demonstrate that architectural choices significantly impact cross-decoding performance: models employing simple linear decoding architectures and multimodal feature decoding generalize better to mental imagery, while complex architectures tend to overfit visual training data. Our findings indicate that mental imagery datasets are critical for the development of practical applications, and establish NSD-Imagery as a useful resource for better aligning visual decoding methods with this goal.

artificial intelligence, machine learning, reconstruction, (19 more...)

2506.06898

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems | May-27-2025, 11:27:15 GMT

Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

language model, llm, visualization-of-thought elicit spatial reasoning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence | May-24-2024

Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Wu, Wenshan, Mao, Shaoguang, Zhang, Yadong, Xia, Yan, Dong, Li, Cui, Lei, Wei, Furu

Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate mental images to facilitate spatial reasoning resembles the mind's eye process, suggesting its potential viability in MLLMs.
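As a rough illustration of what a VoT-style prompt for a 2D grid-world task might look like, the sketch below renders a small grid as text and appends an instruction asking the model to visualize its state after each step. The instruction wording, the grid symbols, and the helper names `render_grid` and `build_vot_prompt` are assumptions made for this example, not details taken from the abstract. No model is called.

```python
# Assumed wording for the visualization instruction (illustrative only).
VOT_INSTRUCTION = "Visualize the state after each reasoning step."


def render_grid(width, height, agent, goal):
    """Draw a text grid: 'A' marks the agent, 'G' the goal, '.' an empty cell."""
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            if (x, y) == agent:
                row += "A"
            elif (x, y) == goal:
                row += "G"
            else:
                row += "."
        rows.append(row)
    return "\n".join(rows)


def build_vot_prompt(task, grid_text):
    """Combine the task description, the grid, and the visualization instruction."""
    return f"{task}\n\n{grid_text}\n\n{VOT_INSTRUCTION}"


grid = render_grid(3, 2, agent=(0, 0), goal=(2, 1))
print(build_vot_prompt("Navigate from A to G using up/down/left/right moves.", grid))
```

The resulting string would be sent to a language model of the reader's choice; the model would be expected to interleave redrawn grids with its reasoning steps.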

instruction, reasoning, visualization, (17 more...)

2404.03622

Country:

Asia > Middle East > Jordan (0.04)
North America > Montserrat (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Monaco (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

The Japan Times | Dec-16-2023, 01:01:00 GMT

Japan scientists create world's first mental images with AI tech

Japanese scientists said they have succeeded in creating the world's first mental images of objects and landscapes from human brain activity by using artificial intelligence technology. The team of scientists from the National Institutes for Quantum Science and Technology, another national institute and Osaka University was able to produce rough images of a leopard, with a recognizable mouth, ears and spotted pattern, as well as objects like an airplane with red lights on its wings. The technology, dubbed "brain decoding," enables the visualization of perceptual contents based on brain activity and could be applied to the medical and welfare fields. The findings were recently published online in the international scientific journal Neural Networks. Previous studies had shown that images seen by human participants could be reconstructed from brain activity measured using functional magnetic resonance imaging, or fMRI, although they were limited to specific domains such as alphabetical letters.

brain activity, japan scientist create world, mental image, (5 more...)

The Japan Times

Country: Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.28)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.97)
Health & Medicine > Health Care Technology (0.83)
Health & Medicine > Diagnostic Medicine > Imaging (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)

arXiv.org Artificial Intelligence | Oct-30-2023

A Novel Representation to Improve Team Problem Solving in Real-Time

Doboli, Alex

This paper proposes a novel representation to support computing metrics that help understand and improve, in real time, a team's behavior during real-life problem solving. Even though teams are important in modern activities, there is little computing aid to improve their activity. The representation captures the different mental images developed, enhanced, and utilized during problem solving. A case study illustrates the representation.

mental image, representation, requirement, (15 more...)

2310.19539

Country: North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Architecture > Real Time Systems (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)